The Global Health Expenditure Database (GHED) provides country-level time series data on health spending from 2000 to 2022. Healthcare in the USA is notably more expensive than in its counterparts across the world. This dataset allows us to examine how different countries allocate their funds across different segments of healthcare. For example, we may be able to explore how much spending is allocated towards injuries.
To explore the dataset, we will look at three subsets in an attempt to uncover promising leads. The motivation for this is the large share of missing values in the dataset, which we demonstrate below. Furthermore, the variable names are not intuitive. Creating smaller, more meaningful subsets of the larger dataset will allow us to explore potentially interesting questions.
Before delving into the analysis, we consider the purpose of the GHED data. It is designed to support policy development, comparative analysis, Sustainable Development Goal monitoring, and advocacy. For instance, granular data available on a standardized, worldwide scale allows UN policy makers to allocate funds more appropriately. Furthermore, officials are able to identify how, for example, aid funds are being spent within a country. This is critical for ensuring that future funds are allocated as effectively as possible.
We build upon these goals to monitor one specific indicator, out-of-pocket (OOP) costs, as a window into everyday life. However, the data is compiled from reports by Ministries of Health, national statistical offices, and other government bodies, so internal bias within a country may influence it. Regardless, we need to look at how funds are being spent by the average person across different countries in order to improve overall quality of life. While we don’t have costs broken down by household, we believe that country-level OOP costs serve as the best available proxy in our data for everyday expenses.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
GHED_data <- read_excel("GHED_data.XLSX", sheet = 1)
# View the structure of the dataset
dim(GHED_data)
## [1] 4244 3923
head(GHED_data)
## # A tibble: 6 × 3,923
## country code region income year che_gdp che_pc_usd hk_gdp hk_g_gdp
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Algeria DZA AFR Lower-middle 2000 3.49 62.1 NA NA
## 2 Algeria DZA AFR Lower-middle 2001 3.84 67.3 NA NA
## 3 Algeria DZA AFR Lower-middle 2002 3.73 66.9 NA NA
## 4 Algeria DZA AFR Lower-middle 2003 3.60 76.2 NA NA
## 5 Algeria DZA AFR Lower-middle 2004 3.54 93.0 NA NA
## 6 Algeria DZA AFR Lower-middle 2005 3.24 101. NA NA
## # ℹ 3,914 more variables: hk_ext_gdp <dbl>, che <dbl>, gghed <dbl>, pvtd <dbl>,
## # ext <dbl>, dom_che <dbl>, gghed_che <dbl>, pvtd_che <dbl>, oops_che <dbl>,
## # vpp_che <dbl>, ext_che <dbl>, gghed_gdp <dbl>, gghed_gge <dbl>,
## # gghed_pc_usd <dbl>, pvtd_pc_usd <dbl>, oop_pc_usd <dbl>, ext_pc_usd <dbl>,
## # tran_shi <dbl>, shise_shi <dbl>, cfa_che <dbl>, gfa_che <dbl>,
## # chi_che <dbl>, shi_che <dbl>, chi_pvt_che <dbl>, vfa_che <dbl>,
## # vhi_che <dbl>, row_che <dbl>, phc_usd_pc <dbl>, phc_che <dbl>, …
We can see that the size of the dataset is 4,244 x 3,923, a large number of both observations and variables. In this raw form, the dataset is hard to draw value from. Additionally, we can begin to see the missing value problem mentioned earlier, particularly in columns 8-10.
GHED_data_filtered <- GHED_data %>%
filter(year == 2021)
# Number of rows (countries) reported for 2021; this is the denominator
# for the share of missing values in each column
n_rows <- nrow(GHED_data_filtered)
missing_values <- as.data.frame(colSums(is.na(GHED_data_filtered)))
colnames(missing_values) <- "n"
missing_values$n <- missing_values$n / n_rows
ggplot(missing_values, aes(x = n)) +
geom_histogram() +
labs(title = "Proportion of Missing Values in 2021", x = "proportion")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
In just one year, the proportion of missing values varies greatly from column to column, from variables that are fully reported to variables that are almost entirely missing. From here on, we will begin to explore subsets of the data. The following three ideas will be investigated.
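As a minimal sketch of the idea, the share of missing values per column can be computed in one step with `colMeans(is.na(...))`. The data frame `toy` below is made up for illustration and only borrows a few GHED column names.

```r
# Toy data frame standing in for one year of GHED data (values are made up).
toy <- data.frame(
  che_gdp  = c(3.5, 4.1, NA, 5.0),
  hk_gdp   = c(NA, NA, NA, NA),
  oops_che = c(40.2, 35.1, 28.7, 51.3)
)

# is.na() returns a logical matrix; colMeans() averages each column,
# which gives the share of rows that are missing in that column.
missing_share <- colMeans(is.na(toy))
print(missing_share)
```

Because the result is a share of rows, every value falls between 0 (fully reported) and 1 (entirely missing).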
To begin this section, we subset the data and rename certain variables to make them more readable and understandable. The aim is to look at how much each country spends on specific diseases as a percentage of its total health expenditure.
In this section we will take a look at:
- Infectious and Parasitic Diseases
- HIV/AIDS and Sexually Transmitted Diseases
- Tuberculosis
- Malaria
- Neglected Tropical Diseases
- Reproductive Health
- Maternal Conditions
- Perinatal Conditions
- Contraceptive Management
- Nutritional Deficiencies
- Injuries
- Other unspecified diseases/conditions
For this subset, we choose 2019 as the baseline, as it represents the most recent snapshot of pre-COVID medical expenses.
disease_dataset <- GHED_data %>%
rename(
infectious = dis1_che,
hiv_aids = dis11_che,
tuberculosis = dis12_che,
malaria = dis13_che,
neg_tropical = dis16_che,
reproductive = dis2_che,
maternal = dis21_che,
perinatal = dis22_che,
contraceptive = dis23_che,
nutrition = dis3_che,
noncomm = dis4_che,
injuries = dis5_che,
other = disnec_che
) %>%
select(
country,
region,
year,
hiv_aids,
tuberculosis,
malaria,
neg_tropical,
reproductive,
maternal,
perinatal,
contraceptive,
nutrition,
injuries,
other
)
head(disease_dataset)
## # A tibble: 6 × 14
## country region year hiv_aids tuberculosis malaria neg_tropical reproductive
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Algeria AFR 2000 NA NA NA NA NA
## 2 Algeria AFR 2001 NA NA NA NA NA
## 3 Algeria AFR 2002 NA NA NA NA NA
## 4 Algeria AFR 2003 NA NA NA NA NA
## 5 Algeria AFR 2004 NA NA NA NA NA
## 6 Algeria AFR 2005 NA NA NA NA NA
## # ℹ 6 more variables: maternal <dbl>, perinatal <dbl>, contraceptive <dbl>,
## # nutrition <dbl>, injuries <dbl>, other <dbl>
Looking through the current subset, many rows are missing information entirely or contain only limited information. To provide a detailed analysis in this section, we need to clean the dataset.
disease_filtered <- subset(disease_dataset, year %in% c(2019))
str(disease_filtered)
## tibble [192 × 14] (S3: tbl_df/tbl/data.frame)
## $ country : chr [1:192] "Algeria" "Angola" "Benin" "Botswana" ...
## $ region : chr [1:192] "AFR" "AFR" "AFR" "AFR" ...
## $ year : num [1:192] 2019 2019 2019 2019 2019 ...
## $ hiv_aids : num [1:192] NA NA NA 20.6 3.77 ...
## $ tuberculosis : num [1:192] NA NA NA 3.386 0.235 ...
## $ malaria : num [1:192] NA NA NA 5.79 15.22 ...
## $ neg_tropical : num [1:192] NA NA NA 1.24 4.88 ...
## $ reproductive : num [1:192] NA NA NA 10 14.5 ...
## $ maternal : num [1:192] NA NA NA 2.96 9.24 ...
## $ perinatal : num [1:192] NA NA NA NA 3.99 ...
## $ contraceptive: num [1:192] NA NA NA NA 1.28 ...
## $ nutrition : num [1:192] NA NA NA 1.739 0.725 ...
## $ injuries : num [1:192] NA NA NA 3.66 1.76 ...
## $ other : num [1:192] NA NA NA 0.0755 33.9561 ...
However, even within this subset, some rows contain no disease data at all, while others are missing only some values. To handle this, we count the missing values in each row across the columns listed below and keep every row that has at least one reported value. Rows in which all of these columns are missing are dropped.
columns_to_check <- c("hiv_aids", "tuberculosis", "malaria", "neg_tropical", "reproductive", "maternal", "perinatal", "contraceptive", "nutrition", "injuries", "other")
clean_disease <- disease_filtered[rowSums(is.na(disease_filtered[columns_to_check])) < length(columns_to_check), ]
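# Count the remaining countries in each region
region_counts_df <- count(clean_disease, region, name = "Count")
region_counts_df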
## # A tibble: 6 × 2
## region Count
## <chr> <int>
## 1 AFR 36
## 2 AMR 2
## 3 EMR 2
## 4 EUR 6
## 5 SEAR 1
## 6 WPR 3
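The row filter used above keeps any row with at least one reported value. A stricter alternative would keep only rows where reported values outnumber missing ones. The sketch below compares the two rules on made-up data; `toy`, `kept_any`, and `kept_majority` are hypothetical names introduced here.

```r
# Made-up rows: A reports nothing, B is missing one value, C is missing two.
toy <- data.frame(
  country  = c("A", "B", "C"),
  malaria  = c(NA, 5.8, NA),
  injuries = c(NA, NA, NA),
  other    = c(NA, 0.1, 33.9)
)
cols <- c("malaria", "injuries", "other")

# Number of missing values per row across the checked columns.
n_missing <- rowSums(is.na(toy[cols]))

# Rule used in the analysis: drop only rows where every column is missing.
kept_any <- toy[n_missing < length(cols), ]

# Stricter rule (assumption, not the analysis code): keep rows where
# reported values outnumber missing values.
kept_majority <- toy[n_missing < length(cols) / 2, ]

print(kept_any$country)
print(kept_majority$country)
```

The choice matters for coverage: the lenient rule retains more countries, at the cost of more gaps in the charts.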
After cleaning the data, we group the countries by region. This gives a clearer picture, as there are too many countries to compare individually.
Different regions also have different spending needs. In regions like Africa, where malaria is more widespread, spending on it would be expected to be much higher than in regions like Europe, where malaria is far less prevalent.
However, far fewer countries from the other regions remain in the data than from the Africa region. This could point to a logistical issue or to governments being less willing to share information.
As such, we plot each region separately to allow a better comparison. Regions will differ in how much is spent on each disease, as particular diseases may be more prevalent in some regions than in others.
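Since the same chart is drawn once per region, the plotting code could be wrapped in a small helper. The sketch below is one possible way to do that, not the code used in this analysis: `plot_region()`, `disease_cols`, and the toy data are hypothetical and introduced only for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Shortened column list for the sketch; the analysis uses eleven columns.
disease_cols <- c("hiv_aids", "tuberculosis", "malaria")

# Hypothetical helper: stacked bar chart for a single WHO region.
plot_region <- function(df, region_code, value_cols = disease_cols) {
  long_df <- df %>%
    filter(region == region_code) %>%
    pivot_longer(cols = all_of(value_cols),
                 names_to = "Metric",
                 values_to = "Percentage")

  ggplot(long_df, aes(x = country, y = Percentage, fill = Metric)) +
    geom_col(position = "stack") +
    labs(title = paste0("Percentage of Health Metrics by Country (",
                        region_code, " Region)"),
         x = "Country", y = "Percentage") +
    theme_minimal() +
    theme(plot.title = element_text(hjust = 0.5),
          axis.text.x = element_text(angle = 45, hjust = 1),
          legend.position = "bottom") +
    scale_y_continuous(labels = scales::percent_format(scale = 1))
}

# Made-up data with two regions.
toy <- data.frame(
  country      = c("A", "B", "C"),
  region       = c("AFR", "AFR", "EUR"),
  hiv_aids     = c(20.6, 3.8, 1.0),
  tuberculosis = c(3.4, 0.2, 0.5),
  malaria      = c(5.8, 15.2, 0.0)
)
p <- plot_region(toy, "AFR")
```

With such a helper, each regional chart reduces to a single call, for example `plot_region(clean_disease, "AMR", value_cols = columns_to_check)`.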
# Stacked Bar Chart of AFR Region
AFR_Region <- ggplot(clean_disease %>% filter(region == "AFR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (AFR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") + # Place legend at the bottom
scale_y_continuous(labels = scales::percent_format(scale = 1))
AFR_Region
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_bar()`).
In the Africa region, the highest spending appears to be on malaria. One possible explanation is that the mosquitoes of that region have a longer lifespan than those of other regions. Although malaria can be treated with proper care, such care is hard to deliver in parts of the region, which makes the disease difficult to control.
# Stacked Bar Chart of AMR Region
AMR_region <- ggplot(clean_disease %>% filter(region == "AMR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (AMR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_y_continuous(labels = scales::percent_format(scale = 1))
AMR_region
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_bar()`).
In the Americas, there is a severe lack of data. This may be partly explained by United States data being kept separate from the rest of the data, which makes it hard to analyze both together. However, the lack of data for the south of the Americas is surprising and could stem from logistical issues or other limiting factors.
# Stacked Bar Chart of EMR Region
EMR_region <- ggplot(clean_disease %>% filter(region == "EMR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (EMR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_y_continuous(labels = scales::percent_format(scale = 1))
EMR_region
## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_bar()`).
The EMR region faces the same issue as the Americas: a surprising lack of information. This could stem from logistical problems caused by conflicts in countries within the region, such as Syria, Yemen, and Afghanistan. Conflict can lead to medical crises and limited access to treatment, making it hard for those countries to collect and report data.
# Stacked Bar Chart of EUR Region
EUR_region <- ggplot(clean_disease %>% filter(region == "EUR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (EUR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_y_continuous(labels = scales::percent_format(scale = 1))
EUR_region
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_bar()`).
For the EUR region, more information may be readily available elsewhere, perhaps from the European Union, which may lead to less reporting to the WHO. Strong healthcare systems may also result in fewer reports on some of the issues that other regions face.
# Stacked Bar Chart of SEAR Region
SEAR_region <- ggplot(clean_disease %>% filter(region == "SEAR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (SEAR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_y_continuous(labels = scales::percent_format(scale = 1))
SEAR_region
In the South-East Asia Region, the lack of data could stem from health system limitations: many countries in SEAR have under-resourced health systems with limited capacity for comprehensive data collection. This can lead to inconsistent or incomplete reporting of health metrics, especially in rural or underserved areas. The region also carries a high disease burden, both from infectious diseases (e.g., tuberculosis, dengue, malaria) and from non-communicable diseases (e.g., diabetes, cardiovascular diseases). The focus on addressing immediate health challenges can deprioritize routine data collection efforts, affecting data quality.
# Stacked Bar Chart of WPR Region
WPR_region <- ggplot(clean_disease %>% filter(region == "WPR") %>%
pivot_longer(cols = c(hiv_aids, tuberculosis, malaria,
neg_tropical, reproductive, maternal,
perinatal, contraceptive, nutrition,
injuries, other),
names_to = "Metric",
values_to = "Percentage"),
aes(x = country, y = Percentage, fill = Metric)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Percentage of Health Metrics by Country (WPR Region)",
x = "Country",
y = "Percentage") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_y_continuous(labels = scales::percent_format(scale = 1))
WPR_region
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_bar()`).
The WPR region could face issues from a digital divide: some people may have limited access to digital health technologies, which can affect the quality and timeliness of data collection.
Overall, given the lack of data for every region except Africa, it is hard to determine disease-specific spending across regions. Having only one or two countries for some regions is insufficient to depict how disease spending is distributed or influenced within those regions as a whole. In the Africa region in particular, we can see that spending on malaria is higher than on other diseases.
The lack of data also raises questions. Efforts should be made to include more countries from underrepresented regions to enhance the dataset’s representativeness, and collaborating with regional health agencies could help fill these gaps.
With more comprehensive data and advanced methodologies, future analysis could offer a clearer understanding of global and regional disease spending patterns.
To better understand how healthcare financing varies across regions and income levels, we conducted a series of analytical steps aimed at uncovering relationships, identifying patterns, and highlighting areas for potential policy intervention. Our approach focuses on key indicators like government health expenditure, out-of-pocket (OOP) costs, and total health expenditure as a share of GDP. Here’s a breakdown of our methodology and key findings.
We first select the relevant columns from the dataset and remove rows with missing values in these columns. This step is essential for accurate comparisons across regions and income levels.
data <- GHED_data
subset_data <- data %>%
select(region, income, year, gghed_che, oops_che, che_gdp, che_pc_usd) %>%
filter(!is.na(gghed_che) & !is.na(oops_che) & !is.na(che_gdp) & !is.na(che_pc_usd))
We group the data by region and income to calculate the mean values of government health expenditure (gghed_che), out-of-pocket expenditure (oops_che), and overall health expenditure, both as a share of GDP (che_gdp) and per capita in USD (che_pc_usd). This aggregation allows for a clearer comparison of health financing across different economic and geographic contexts. By summarizing these variables, we aim to highlight disparities in how healthcare is financed in high-income vs. low-income regions.
summary_stats <- subset_data %>%
group_by(region, income) %>%
summarise(
mean_gghed_che = mean(gghed_che, na.rm = TRUE),
mean_oops_che = mean(oops_che, na.rm = TRUE),
mean_che_gdp = mean(che_gdp, na.rm = TRUE),
mean_che_pc_usd = mean(che_pc_usd, na.rm = TRUE)
)
## `summarise()` has grouped output by 'region'. You can override using the
## `.groups` argument.
# Print the summary stats
print(summary_stats)
## # A tibble: 19 × 6
## # Groups: region [6]
## region income mean_gghed_che mean_oops_che mean_che_gdp mean_che_pc_usd
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AFR High 74.5 23.3 4.68 552.
## 2 AFR Low 22.7 41.1 5.56 30.1
## 3 AFR Lower-middle 36.1 41.3 4.73 83.5
## 4 AFR Upper-middle 46.1 29.7 5.44 332.
## 5 AMR High 53.2 30.8 7.70 2002.
## 6 AMR Lower-middle 43.7 39.2 6.68 143.
## 7 AMR Upper-middle 54.0 35.4 5.98 388.
## 8 EMR High 74.1 17.2 3.48 1037.
## 9 EMR Low 25.5 65.9 6.53 56.1
## 10 EMR Lower-middle 39.9 48.1 5.06 197.
## 11 EMR Upper-middle 55.6 37.5 5.36 223.
## 12 EUR High 72.7 20.0 8.01 3160.
## 13 EUR Lower-middle 40.6 54.3 6.08 90.3
## 14 EUR Upper-middle 50.9 45.0 6.52 332.
## 15 SEAR Lower-middle 35.4 49.2 3.77 56.9
## 16 SEAR Upper-middle 61.6 26.9 6.15 393.
## 17 WPR High 72.3 15.7 7.91 1986.
## 18 WPR Lower-middle 51.2 23.7 5.68 115.
## 19 WPR Upper-middle 56.9 17.8 8.20 459.
We calculate correlations between:
- government health expenditure (gghed_che) and out-of-pocket expenditure (oops_che)
- government health expenditure (gghed_che) and health expenditure as % of GDP (che_gdp)
- out-of-pocket expenditure (oops_che) and health expenditure as % of GDP (che_gdp)
This step helps to see how government spending or individual costs relate to overall health expenditure.
There is a negative correlation of -0.739 between government expenditure and out-of-pocket expenditure, which suggests that in countries where the government spends more on healthcare, individuals pay less out-of-pocket for medical services and treatments. In other words, greater government funding is associated with a lower financial burden on citizens. Because both measures are shares of the same total (current health expenditure), part of this negative relationship is mechanical.
There is a positive correlation of 0.229 between government expenditure and health expenditure as % of GDP, which suggests that the more the government contributes to healthcare, the more the country as a whole tends to spend on health relative to the size of its economy.
There is a negative correlation of -0.351 between out-of-pocket expenditure and health expenditure as % of GDP, which suggests that countries or regions with higher health expenditure as a percentage of GDP tend to have a lower out-of-pocket share.
correlation_analysis <- subset_data %>%
summarise(
correlation_gghed_oops = cor(gghed_che, oops_che, use = "complete.obs"),
correlation_gghed_che_gdp = cor(gghed_che, che_gdp, use = "complete.obs"),
correlation_oops_che_gdp = cor(oops_che, che_gdp, use = "complete.obs")
)
# Print the correlation analysis
print(correlation_analysis)
## # A tibble: 1 × 3
## correlation_gghed_oops correlation_gghed_che_gdp correlation_oops_che_gdp
## <dbl> <dbl> <dbl>
## 1 -0.739 0.229 -0.351
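As an alternative view, the same pairwise correlations can be read off a single correlation matrix. The sketch below uses made-up values in a hypothetical data frame `toy`; only the column names are borrowed from the analysis.

```r
# Made-up values; oops_che falls by exactly 1 for each 1-point rise in
# gghed_che, so that pair is perfectly negatively correlated.
toy <- data.frame(
  gghed_che = c(20, 40, 60, 80),
  oops_che  = c(70, 50, 30, 10),
  che_gdp   = c(4, 5, 5, 8)
)

# cor() on a data frame returns the full matrix of pairwise Pearson
# correlations; use = "complete.obs" drops rows with any missing value.
cor_matrix <- cor(toy, use = "complete.obs")
print(round(cor_matrix, 3))
```

Each off-diagonal cell corresponds to one of the pairs listed above, and the diagonal is always 1.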
The scatter plot explores the relationship between government expenditure and out-of-pocket costs across income groups. Because color is mapped to income level, a separate linear trend line is fitted for each income group.
The analysis reveals that high-income regions have higher government healthcare expenditure and lower out-of-pocket costs compared to low-income regions, which exhibit the opposite trend. This reflects the disparity in healthcare financing structures and highlights the need for stronger public healthcare support in low-income areas.
ggplot(subset_data, aes(x = gghed_che, y = oops_che, color = income)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
labs(
title = "Relationship between Government Expenditure and Out-of-Pocket Expenditure",
x = "Government Expenditure on Health (% of Current Health Expenditure)",
y = "Out-of-Pocket Expenditure (% of Current Health Expenditure)",
color = "Income Level"
) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
The bar chart shows average health expenditure per capita by income
level and region, providing a clear comparison of spending levels. The
drastic difference in average health expenditure per capita between
the high-income group and the other three income groups shows a
concerning reality of how wealth affects health resource
allocation. This reinforces the need for more equitable allocation of
health resources to bridge the gap in healthcare access between
wealthier and poorer regions.
ggplot(summary_stats, aes(x = income, y = mean_che_pc_usd, fill = region)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Average Health Expenditure per Capita by Income Level and Region",
x = "Income Level",
y = "Average Health Expenditure per Capita (USD)",
fill = "Region"
) +
theme_minimal()
The line chart tracks government health expenditure as a % of current health expenditure (CHE) over time by income level, allowing us to see trends across different income groups from 2000 to 2022.
Unsurprisingly, the high-income group has a consistently high government share, at around 80% of CHE. The upper-middle group has seen a decrease since 2020 and by 2022 had fallen to the same level as the lower-middle group. The low-income group struggles to stay above roughly 20% of CHE.
This trend highlights the growing need for targeted interventions to bolster government healthcare funding, especially in upper-middle- and low-income regions. Without greater public support, these regions may face increased reliance on out-of-pocket payments, worsening inequality.
ggplot(subset_data, aes(x = year, y = gghed_che, color = income)) +
geom_line(stat = "summary", fun = "mean") +
labs(
title = "Trend of Government Health Expenditure (% of CHE) by Income Group (2000-2022)",
x = "Year",
y = "Government Expenditure on Health (% of Current Health Expenditure)",
color = "Income Group"
) +
theme_minimal()
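The line chart relies on `stat = "summary"` with `fun = "mean"`, which averages the y values at each x position within each group. A rough sketch of the equivalent table, computed with base R's `aggregate()` on made-up data (the data frame `toy` is hypothetical):

```r
# Made-up observations: two high-income countries and one low-income
# country, observed in two years.
toy <- data.frame(
  year      = c(2000, 2000, 2000, 2001, 2001, 2001),
  income    = c("High", "High", "Low", "High", "High", "Low"),
  gghed_che = c(70, 80, 20, 72, 84, 25)
)

# Mean government share for every year and income group combination;
# these are the points the summary line connects.
yearly_means <- aggregate(gghed_che ~ year + income, data = toy, FUN = mean)
print(yearly_means)
```

Note that each country-year counts equally in this mean, so the lines are unweighted averages across countries rather than population-weighted figures.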
The box plot identifies outliers in out-of-pocket and government health expenditures across income levels and regions. This can help identify patterns and variances that warrant further investigation.
High-income regions: Outliers in the EUR and WPR regions suggest that even wealthy regions experience variability in OOP costs, underscoring the need to address areas of financial vulnerability.
Upper-middle-income regions: Outliers in the EMR region signal that even within this income group, there are countries with disproportionately high OOP costs, which may indicate gaps in healthcare access or government support.
ggplot(subset_data, aes(x = income, y = oops_che, fill = region)) +
geom_boxplot() +
labs(
title = "Out-of-Pocket Expenditure Distribution by Income Level and Region",
x = "Income Level",
y = "Out-of-Pocket Expenditure (% of Current Health Expenditure)"
) +
theme_minimal()
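For context, `geom_boxplot()` draws a point as an outlier when it lies more than 1.5 times the interquartile range beyond the box. Base R's `boxplot.stats()` applies the same 1.5 x IQR convention, although it computes the box from hinges, which can differ slightly from the quartiles ggplot2 uses in small samples. A minimal sketch on made-up values:

```r
# Made-up out-of-pocket shares (% of CHE) for seven countries.
oop_share <- c(10, 12, 13, 14, 15, 16, 60)

# boxplot.stats() flags values beyond 1.5 * IQR from the hinges
# (coef = 1.5 is the default).
stats <- boxplot.stats(oop_share, coef = 1.5)
outliers <- stats$out
print(outliers)
```

Here the hinges are 12.5 and 15.5, so the upper fence sits at 15.5 + 1.5 * 3 = 20 and only the value 60 is flagged.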
We used a box plot to visualize the distribution of government health
expenditure (GGHE-D) by region and income level. This analysis helps
identify regions that exhibit unusually high or low government
spending.
Key Insight: Outliers were found in all four income levels, with notable instances in:
High-income regions: Outliers in the EUR region may warrant investigation, as they suggest variability in how public funds are being allocated.
Low-income regions: Outliers in the AFR region highlight significant disparities within this group, suggesting uneven allocation of limited healthcare resources.
Lower-middle-income regions: Outliers in the AMR region point to potential underfunding in some countries, which may affect the affordability of healthcare services.
Upper-middle-income regions: Outliers in the AFR and WPR regions suggest the need for targeted funding policies to reduce disparities in healthcare financing.
ggplot(subset_data, aes(x = income, y = gghed_che, fill = region)) +
geom_boxplot() +
labs(
title = "Government Health Expenditure Distribution by Income Level and Region",
x = "Income Level",
y = "Government Health Expenditure (% of Current Health Expenditure)"
) +
theme_minimal()
The identification of outliers in both government and out-of-pocket expenditure reveals crucial areas for potential policy intervention. For instance:
High-income regions: While these regions generally exhibit strong government healthcare support, the outliers in the EUR and WPR regions suggest areas where regulatory oversight is needed to investigate why OOP costs are still significant.
Low- and middle-income regions: Outliers in these regions emphasize the need for external support and improved public healthcare funding to reduce OOP costs. Targeted interventions could ensure more equitable healthcare access and better financial protection for citizens.
These findings highlight the critical role of public healthcare funding in reducing individual financial burdens and ensuring fair access to healthcare. Policymakers should focus on regions with extreme outliers, as these may reflect areas where targeted funding or policy reforms are urgently needed.
The analysis of the GHED data sheds light on significant disparities in healthcare financing across regions and income levels. The relationships between government health expenditure, out-of-pocket costs, and total health expenditure reveal the protective role of public healthcare funding. Our visualizations—scatter plots, bar charts, line charts, and box plots—highlight critical insights and outliers, emphasizing the need for targeted policy interventions. The most pressing concern lies in lower-income regions, where insufficient public healthcare funding results in higher out-of-pocket costs for citizens. Future efforts should prioritize addressing these disparities to promote equitable access to healthcare and improve overall health outcomes.
To begin analyzing the data, we will select a subset of variables and rename them in order to make downstream analysis easier. Displayed below is a subset of the new dataset which we will use to explore our question. The goal is to look at the proportion of OOP healthcare costs vs. social insurance contributions across different countries.
The scope of this analysis includes all countries with valid data. We will begin by looking at all years and eventually narrow down to 2019. We choose 2019 as a baseline as it represents the most recent snapshot of pre-COVID medical expenses.
# Rename variables to have intuitive names
OOP_subset <- GHED_data %>%
rename(
current_health_expenditure_pct_gdp = che_gdp,
gdp_per_capita_usd = gdp_pc_usd,
primary_health_care_expenditure_pct_che = phc_che,
total_expenditure = hf,
govt_and_compulsory_financing = hf1,
government_schemes = hf11,
compulsory_contributory_health_insurance = hf12,
social_health_insurance_schemes = hf121,
compulsory_private_insurance_schemes = hf122,
unspecified_compulsory_contributory_insurance = hf12nec,
compulsory_medical_savings_accounts = hf13,
unspecified_govt_compulsory_schemes = hf1nec,
voluntary_health_payment_schemes = hf2,
voluntary_health_insurance_schemes = hf21,
non_profit_institutions_serving_households = hf22,
enterprise_financing_schemes = hf23,
unspecified_voluntary_payment_schemes = hf2nec,
household_out_of_pocket_payments = hf3,
rest_of_world_financing_schemes = hf4
) %>%
select(
country,
region,
year,
income,
total_expenditure,
current_health_expenditure_pct_gdp,
gdp_per_capita_usd,
primary_health_care_expenditure_pct_che,
govt_and_compulsory_financing,
government_schemes,
compulsory_contributory_health_insurance,
social_health_insurance_schemes,
compulsory_private_insurance_schemes,
unspecified_compulsory_contributory_insurance,
compulsory_medical_savings_accounts,
unspecified_govt_compulsory_schemes,
voluntary_health_payment_schemes,
voluntary_health_insurance_schemes,
non_profit_institutions_serving_households,
enterprise_financing_schemes,
unspecified_voluntary_payment_schemes,
household_out_of_pocket_payments,
rest_of_world_financing_schemes
)
head(OOP_subset)
## # A tibble: 6 × 23
## country region year income total_expenditure current_health_expenditu…¹
## <chr> <chr> <dbl> <chr> <dbl> <dbl>
## 1 Algeria AFR 2000 Lower-middle 143870. 3.49
## 2 Algeria AFR 2001 Lower-middle 162231. 3.84
## 3 Algeria AFR 2002 Lower-middle 168702. 3.73
## 4 Algeria AFR 2003 Lower-middle 189137. 3.60
## 5 Algeria AFR 2004 Lower-middle 217929. 3.54
## 6 Algeria AFR 2005 Lower-middle 244643. 3.24
## # ℹ abbreviated name: ¹current_health_expenditure_pct_gdp
## # ℹ 17 more variables: gdp_per_capita_usd <dbl>,
## # primary_health_care_expenditure_pct_che <dbl>,
## # govt_and_compulsory_financing <dbl>, government_schemes <dbl>,
## # compulsory_contributory_health_insurance <dbl>,
## # social_health_insurance_schemes <dbl>,
## # compulsory_private_insurance_schemes <dbl>, …
As it currently stands, the variables are absolute amounts and are not very informative on their own. We will first transform them into percentages of total expenditure. We consider the total spending to be the sum of out-of-pocket payments, contributions from social insurance, compulsory prepayment, and voluntary prepayment. We will also do some minor pre-cleaning of the data.
# Calculate percentages of financing schemes relative to total health expenditure
OOP_subset <- OOP_subset %>%
mutate(
pct_govt_intervention = (govt_and_compulsory_financing / total_expenditure) * 100,
pct_voluntary_protection = (voluntary_health_payment_schemes / total_expenditure) * 100,
pct_household_out_of_pocket = (household_out_of_pocket_payments / total_expenditure) * 100
)
str(OOP_subset)
## tibble [4,244 × 26] (S3: tbl_df/tbl/data.frame)
## $ country : chr [1:4244] "Algeria" "Algeria" "Algeria" "Algeria" ...
## $ region : chr [1:4244] "AFR" "AFR" "AFR" "AFR" ...
## $ year : num [1:4244] 2000 2001 2002 2003 2004 ...
## $ income : chr [1:4244] "Lower-middle" "Lower-middle" "Lower-middle" "Lower-middle" ...
## $ total_expenditure : num [1:4244] 143870 162231 168702 189137 217929 ...
## $ current_health_expenditure_pct_gdp : num [1:4244] 3.49 3.84 3.73 3.6 3.54 ...
## $ gdp_per_capita_usd : num [1:4244] 1780 1755 1795 2117 2625 ...
## $ primary_health_care_expenditure_pct_che : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ govt_and_compulsory_financing : num [1:4244] 103539 123669 127002 145082 155532 ...
## $ government_schemes : num [1:4244] 66056 81878 83246 95931 102717 ...
## $ compulsory_contributory_health_insurance : num [1:4244] 37483 41791 43756 49152 52815 ...
## $ social_health_insurance_schemes : num [1:4244] 37483 41791 43756 49152 52815 ...
## $ compulsory_private_insurance_schemes : num [1:4244] 0 0 0 0 0 0 0 0 0 0 ...
## $ unspecified_compulsory_contributory_insurance: num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ compulsory_medical_savings_accounts : num [1:4244] 0 0 0 0 0 0 0 0 0 0 ...
## $ unspecified_govt_compulsory_schemes : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ voluntary_health_payment_schemes : num [1:4244] 3221 3409 3700 4055 5530 ...
## $ voluntary_health_insurance_schemes : num [1:4244] 1177 1375 1610 1855 3150 ...
## $ non_profit_institutions_serving_households : num [1:4244] 80 85 90.5 100 110 ...
## $ enterprise_financing_schemes : num [1:4244] 1964 1949 2000 2100 2270 ...
## $ unspecified_voluntary_payment_schemes : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ household_out_of_pocket_payments : num [1:4244] 37111 35153 38000 40000 56867 ...
## $ rest_of_world_financing_schemes : num [1:4244] NA NA NA NA NA NA NA NA NA NA ...
## $ pct_govt_intervention : num [1:4244] 72 76.2 75.3 76.7 71.4 ...
## $ pct_voluntary_protection : num [1:4244] 2.24 2.1 2.19 2.14 2.54 ...
## $ pct_household_out_of_pocket : num [1:4244] 25.8 21.7 22.5 21.1 26.1 ...
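As a rough sanity check on the shares above, the three percentages should sum to roughly 100 whenever the three components account for all of total_expenditure. The sketch below reuses the rounded Algeria 2000 values shown in the output above; the name `algeria_2000` is illustrative, and base R is used only to keep the example dependency-free. It is an assumption, not something the source states, that rows summing well below 100 have spending recorded under other schemes.

```r
# Rounded Algeria 2000 values as displayed in the str() output above
algeria_2000 <- data.frame(
  total_expenditure = 143870,
  govt_and_compulsory_financing = 103539,
  voluntary_health_payment_schemes = 3221,
  household_out_of_pocket_payments = 37111
)

# Same share calculation as in the mutate() call, written in base R
shares <- with(algeria_2000, c(
  pct_govt_intervention = govt_and_compulsory_financing / total_expenditure * 100,
  pct_voluntary_protection = voluntary_health_payment_schemes / total_expenditure * 100,
  pct_household_out_of_pocket = household_out_of_pocket_payments / total_expenditure * 100
))

round(shares, 2)
share_total <- sum(shares)  # close to 100 if the components cover total spending
share_total
```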
We want to ensure that there are no missing values in our primary variables. It appears from the step below that our data is fairly clean.
# Remove rows with missing values in key variables
OOP_subset <- OOP_subset %>%
filter(
!is.na(pct_govt_intervention) &
!is.na(pct_voluntary_protection) &
!is.na(pct_household_out_of_pocket)
)
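One way to back up the claim that the data is fairly clean is to count missing values per key variable and compare row counts before and after the filter. The sketch below uses a made-up data frame named `toy` in place of OOP_subset, so the counts are illustrative only.

```r
# Toy data standing in for OOP_subset (values are made up)
toy <- data.frame(
  pct_govt_intervention = c(72.0, NA, 55.1, 40.3),
  pct_voluntary_protection = c(2.2, 3.1, NA, 5.0),
  pct_household_out_of_pocket = c(25.8, 30.0, 41.2, NA)
)
key_vars <- names(toy)

# Missing values per key variable
na_counts <- colSums(is.na(toy[key_vars]))
na_counts

# Rows that would survive a complete-case filter on the key variables
n_before <- nrow(toy)
n_after <- sum(complete.cases(toy[key_vars]))
c(before = n_before, after = n_after, dropped = n_before - n_after)
```

Running the same two checks on the real data before filtering would show how many rows the filter actually removes.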
We will begin by exploring the basic distribution of our data.
ggplot(OOP_subset, aes(x = pct_govt_intervention)) +
geom_histogram(binwidth = 5, fill = 'blue', color = 'black') +
labs(
title = 'Distribution of Government Based Schemes',
x = 'Percentage (%)',
y = 'Frequency'
)
ggplot(OOP_subset, aes(x = pct_household_out_of_pocket)) +
geom_histogram(binwidth = 5, fill = 'red', color = 'black') +
labs(
title = 'Distribution of Household Out-of-Pocket Payments',
x = 'Percentage (%)',
y = 'Frequency'
)
ggplot(OOP_subset, aes(x = pct_voluntary_protection)) +
geom_histogram(binwidth = 5, fill = 'green', color = 'black') +
labs(
title = 'Distribution of Voluntary Payments',
x = 'Percentage (%)',
y = 'Frequency'
)
Now is a good time to explain what each of these variables means.
First, the distribution of government-based schemes shows how much of total healthcare spending is funded through government and compulsory financing arrangements. A high percentage here indicates that healthcare is funded more through taxes and mandatory contributions than by individuals.
Second, the distribution of out-of-pocket payments shows the share of total healthcare expenditure that households pay directly at the point of service. Insurance premiums and taxes are therefore not included in this category.
Finally, the distribution of voluntary payments shows the relative amount of health expenditure coming from sources that people opt into. The easiest example is supplemental health insurance, where people can choose the plan they want.
Now we will look into some descriptive statistics by year.
# Calculate basic statistical metrics for the new variables grouped by year
yearly_stats <- OOP_subset %>%
group_by(year) %>%
summarise(
mean_govt_intervention = mean(pct_govt_intervention, na.rm = TRUE),
median_govt_intervention = median(pct_govt_intervention, na.rm = TRUE),
sd_govt_intervention = sd(pct_govt_intervention, na.rm = TRUE),
mean_out_of_pocket = mean(pct_household_out_of_pocket, na.rm = TRUE),
median_out_of_pocket = median(pct_household_out_of_pocket, na.rm = TRUE),
sd_out_of_pocket = sd(pct_household_out_of_pocket, na.rm = TRUE),
mean_voluntary_protection = mean(pct_voluntary_protection, na.rm = TRUE),
median_voluntary_protection = median(pct_voluntary_protection, na.rm = TRUE),
sd_voluntary_protection = sd(pct_voluntary_protection, na.rm = TRUE)
)
head(yearly_stats)
## # A tibble: 6 × 10
## year mean_govt_intervention median_govt_intervention sd_govt_intervention
## <dbl> <dbl> <dbl> <dbl>
## 1 2000 52.8 51.1 22.0
## 2 2001 52.6 51.4 21.6
## 3 2002 52.4 52.0 21.9
## 4 2003 52.7 52.3 21.7
## 5 2004 52.2 52.7 21.3
## 6 2005 52.5 52.9 20.7
## # ℹ 6 more variables: mean_out_of_pocket <dbl>, median_out_of_pocket <dbl>,
## # sd_out_of_pocket <dbl>, mean_voluntary_protection <dbl>,
## # median_voluntary_protection <dbl>, sd_voluntary_protection <dbl>
Looking at the number of observations per year, we can see a sharp drop at the final time point, 2022, which has only 20 observations. We therefore remove 2022 from downstream analysis.
# Calculate the number of observations per year (rows with NAs were removed above)
obs_count_per_year <- OOP_subset %>%
group_by(year) %>%
summarise(observation_count = n())
# Plot the number of observations over time
ggplot(obs_count_per_year, aes(x = year, y = observation_count)) +
geom_line(color = "blue", linewidth = 1) +
geom_point(color = "red", size = 2) +
labs(
title = "Number of Observations Over Time",
x = "Year",
y = "Number of Observations"
)
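Sparse years can also be flagged programmatically rather than read off the plot. The sketch below is one possible rule, not the one used in this report: a year counts as sparse if it has fewer than half the median number of observations. The per-year counts are hypothetical except for the 20 observations in 2022 mentioned above.

```r
# Hypothetical per-year counts; only the 2022 count of 20 comes from the text
obs_count_per_year <- data.frame(
  year = 2018:2022,
  observation_count = c(190, 191, 189, 188, 20)
)

# Assumed rule: a year is sparse if it has fewer than half the median count
cutoff <- 0.5 * median(obs_count_per_year$observation_count)
sparse_years <- obs_count_per_year$year[obs_count_per_year$observation_count < cutoff]
sparse_years
```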
Utilizing these yearly aggregated statistics, let us observe how the following have changed over time for each one of our variables:
How the median has changed over time. This will indicate relative trends for each of our variables.
How the coefficient of variation has changed over time. This will indicate if countries are either conforming to similar policies or if, instead, they are becoming more disparate.
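The coefficient of variation is the standard deviation divided by the mean, which makes dispersion comparable across variables whose means differ widely (for example, government shares versus the much smaller voluntary shares). A minimal sketch with made-up shares, where `cv` is an illustrative helper name:

```r
# Coefficient of variation: standard deviation relative to the mean
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Made-up shares (%) with the same absolute spread but different means
high_share <- c(60, 70, 80)
low_share <- c(10, 20, 30)

same_sd <- isTRUE(all.equal(sd(high_share), sd(low_share)))
cv_high <- cv(high_share)  # 10 / 70
cv_low <- cv(low_share)    # 10 / 20
c(cv_high = cv_high, cv_low = cv_low)
```

Both vectors have the same standard deviation, but the one with the lower mean has the larger coefficient of variation, so a rising CV can reflect a falling mean as well as growing spread.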
yearly_stats <- yearly_stats %>%
filter(year != 2022)
ggplot(yearly_stats, aes(x = year)) +
geom_line(aes(y = median_govt_intervention, color = "Govt Intervention"), linewidth = 1) +
geom_line(aes(y = median_out_of_pocket, color = "Out-of-Pocket"), linewidth = 1) +
geom_line(aes(y = median_voluntary_protection, color = "Voluntary Protection"), linewidth = 1) +
labs(
title = "Median of Health Financing Variables Over Time",
x = "Year",
y = "Median (%)",
color = "Variable"
)
yearly_stats$cv_govt_intervention <- yearly_stats$sd_govt_intervention / yearly_stats$mean_govt_intervention
yearly_stats$cv_out_of_pocket <- yearly_stats$sd_out_of_pocket / yearly_stats$mean_out_of_pocket
yearly_stats$cv_voluntary_protection <- yearly_stats$sd_voluntary_protection / yearly_stats$mean_voluntary_protection
ggplot(yearly_stats, aes(x = year)) +
geom_line(aes(y = cv_govt_intervention, color = "Govt Intervention"), linewidth = 1) +
geom_line(aes(y = cv_out_of_pocket, color = "Out-of-Pocket"), linewidth = 1) +
geom_line(aes(y = cv_voluntary_protection, color = "Voluntary Protection"), linewidth = 1) +
labs(
title = "Coefficient of Variation of Health Financing Variables Over Time",
x = "Year",
y = "Coefficient of Variation",
color = "Variable"
)
For the final portion of this analysis, we will look at the relationship between percent spending on government intervention and out of pocket costs. We will also investigate how voluntary protection can protect consumers.
# Scatter plot of out-of-pocket payments vs. government intervention
ggplot(OOP_subset, aes(x = pct_govt_intervention, y = pct_household_out_of_pocket, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Out-of-Pocket Payments vs. Government Intervention',
x = 'Government Intervention (% of Total Health Expenditure)',
y = 'Out-of-Pocket Payments (% of Total Health Expenditure)'
)
## `geom_smooth()` using formula = 'y ~ x'
# Scatter plot of voluntary protection vs. government intervention
ggplot(OOP_subset, aes(x = pct_govt_intervention, y = pct_voluntary_protection, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Voluntary Protection Contribution vs. Government Intervention Contribution',
x = 'Government Intervention (% of Total Health Expenditure)',
y = 'Voluntary Protection Contribution (% of Total Health Expenditure)'
)
## `geom_smooth()` using formula = 'y ~ x'
The analysis above has opened up potentially interesting questions for further investigation. Unsurprisingly, as the government share of health spending decreases, the out-of-pocket share borne by individuals increases. However, some poorer countries have less government support yet still have out-of-pocket shares comparable to those of higher-income countries. Investigating why this is the case may prove fruitful. Furthermore, some low-income countries appear able to offset low government intervention with voluntary protection plans, while others are not. Identifying what characteristics distinguish these two types of countries may also prove to be an interesting line of inquiry.
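Part of the negative relationship between the government share and the out-of-pocket share is mechanical: when the shares sum to roughly 100, a higher value of one leaves less room for the others. The simulation below illustrates this with randomly generated shares rather than GHED data; the ranges and the seed are arbitrary assumptions.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible
n <- 500

# Simulated shares (%), not GHED data: a random government share,
# a small voluntary share, and out-of-pocket as the remainder
govt <- runif(n, min = 10, max = 90)
voluntary <- runif(n, min = 0, max = 10)
out_of_pocket <- 100 - govt - voluntary

sim_cor <- cor(govt, out_of_pocket)  # strongly negative by construction
sim_slope <- unname(coef(lm(out_of_pocket ~ govt))[2])
c(correlation = sim_cor, slope = sim_slope)
```

For this reason, the countries that deviate from the fitted line are arguably more informative than the slope itself.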
# Investigation of how government spending in the previous period impacts
# label minimum year
labeler <- OOP_subset %>%
group_by(country) %>%
summarise(min(year))
colnames(labeler) <- c("country", "year")
labeler$first_year <- TRUE
OOP_subset <- OOP_subset %>%
left_join(labeler, by = c("country", "year"))
gdp_joiner <- GHED_data %>%
select(country, year, gdp)
OOP_subset <- OOP_subset %>%
left_join(gdp_joiner, by = c("country", "year"))
OOP_subset_sorted <- OOP_subset %>%
arrange(country, year)
# lag() gives the previous row's value; rows are sorted by country and year,
# and each country's first year is dropped below so values never cross countries
OOP_subset_sorted$prev_gov_int <- dplyr::lag(OOP_subset_sorted$pct_govt_intervention)
# compute change in gov int
OOP_subset_sorted$gov_int_change <- OOP_subset_sorted$pct_govt_intervention - OOP_subset_sorted$prev_gov_int
OOP_subset_sorted$prev_oop <- dplyr::lag(OOP_subset_sorted$household_out_of_pocket_payments)
OOP_subset_sorted$change_oop <- OOP_subset_sorted$household_out_of_pocket_payments - OOP_subset_sorted$prev_oop
OOP_subset_sorted$gdp_oop_ratio <- OOP_subset_sorted$household_out_of_pocket_payments / OOP_subset_sorted$gdp
OOP_subset_sorted$gdp_oop_ratio_difference <- OOP_subset_sorted$change_oop / OOP_subset_sorted$gdp
OOP_subset_sorted <- OOP_subset_sorted %>%
mutate(first_year = if_else(is.na(first_year), FALSE, first_year)) %>%
filter(!first_year)
OOP_subset_sorted %>%
ggplot(aes(x = gov_int_change, y = gdp_oop_ratio_difference)) +
geom_point()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
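A year-over-year change within each country can be sketched on toy data as follows. This version uses base R's `ave()` to shift values within each country, so the first year of every country is NA by construction and no value leaks across countries. The data frame `toy` and its values are made up.

```r
# Toy panel: two countries with made-up government shares (%)
toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  year = c(2000, 2001, 2002, 2000, 2001),
  share = c(50, 55, 53, 20, 26)
)
toy <- toy[order(toy$country, toy$year), ]

# Previous year's value within each country (NA for each country's first year)
toy$prev_share <- ave(toy$share, toy$country,
                      FUN = function(x) c(NA, head(x, -1)))

# Change relative to the previous observed year
toy$change <- toy$share - toy$prev_share
toy
```

Note that the shift is by row, so if a country is missing a year the change spans more than one year.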
# compute the following
# P(OOP Decreases | Gov Int Increases)
OOP_subset_sorted$pos_gov_change <- OOP_subset_sorted$gov_int_change > 0
OOP_subset_sorted$pos_oop_change <- OOP_subset_sorted$gdp_oop_ratio_difference > 0
proportions <- OOP_subset_sorted %>%
mutate(pos_gov_change = gov_int_change > 0,
neg_oop_change = gdp_oop_ratio_difference < 0) %>%
group_by(pos_gov_change) %>%
summarize(
total = n(),
neg_oop_count = sum(neg_oop_change, na.rm = TRUE),
neg_oop_proportion = mean(neg_oop_change, na.rm = TRUE)
) %>%
drop_na()
contingency_table <- OOP_subset_sorted %>%
mutate(pos_gov_change = gov_int_change > 0,
neg_oop_change = gdp_oop_ratio_difference < 0) %>%
count(pos_gov_change, neg_oop_change) %>%
drop_na() %>%
tidyr::pivot_wider(names_from = neg_oop_change, values_from = n, values_fill = 0)
chisq_test <- chisq.test(as.matrix(contingency_table[,-1]))
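The conditional proportion P(OOP decreases | Gov Int increases) and the chi-squared test can be read off a 2 x 2 table directly. The sketch below uses hypothetical counts, not results from the GHED data, to show how the row-wise proportions and the p-value are obtained.

```r
# Hypothetical counts of country-years (not taken from the GHED results)
counts <- matrix(
  c(120, 80,    # government share fell: OOP fell / OOP did not fall
    150, 50),   # government share rose: OOP fell / OOP did not fall
  nrow = 2, byrow = TRUE,
  dimnames = list(gov_share_rose = c("FALSE", "TRUE"),
                  oop_fell = c("TRUE", "FALSE"))
)

# Row-wise proportions: P(OOP fell | direction of government change)
cond_props <- prop.table(counts, margin = 1)
p_fall_given_rise <- cond_props["TRUE", "TRUE"]  # 150 / 200
p_fall_given_rise

# Chi-squared test of independence on the 2 x 2 table
test_result <- chisq.test(counts)
test_result$p.value
```

A small p-value would suggest that the direction of the out-of-pocket change is associated with the direction of the government change, though repeated observations of the same country are not independent, which the test assumes.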
OOP_subset_sorted %>%
ggplot(aes(x = pos_gov_change, y = gdp_oop_ratio_difference)) +
geom_point()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
proportions_yr <- OOP_subset_sorted %>%
mutate(pos_gov_change = gov_int_change > 0,
neg_oop_change = gdp_oop_ratio_difference < 0) %>%
group_by(pos_gov_change, year) %>%
summarize(
total = n(),
neg_oop_count = sum(neg_oop_change, na.rm = TRUE),
neg_oop_proportion = mean(neg_oop_change, na.rm = TRUE)
) %>%
drop_na()
## `summarise()` has grouped output by 'pos_gov_change'. You can override using
## the `.groups` argument.
proportions_yr %>%
ggplot(aes(x = year, y = neg_oop_proportion, color = pos_gov_change)) +
geom_point() + labs(x = "Year", y = "Negative OOP Change Proportion", color = "Positive Change Gov Spending")
# Perform the regression analysis
model <- lm(pct_household_out_of_pocket ~ pct_govt_intervention, data = OOP_subset)
# Extract coefficients
coeffs <- coef(model)
intercept <- round(coeffs[1], 2)
slope <- round(coeffs[2], 2)
# Format the regression equation as a string
regression_eq <- paste0("y = ", slope, "x + ", intercept)
# Create the plot with annotation
ggplot(OOP_subset, aes(x = pct_govt_intervention, y = pct_household_out_of_pocket, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Out-of-Pocket Payments vs. Government Intervention',
x = 'Government Intervention (% of Total Health Expenditure)',
y = 'Out-of-Pocket Payments (% of Total Health Expenditure)'
) +
annotate("text", x = max(OOP_subset$pct_govt_intervention) - 10,
y = max(OOP_subset$pct_household_out_of_pocket) - 10,
label = regression_eq,
color = "black",
size = 5,
hjust = 1)
## `geom_smooth()` using formula = 'y ~ x'
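Building the equation label with paste0 prints a negative intercept as "+ -3". The helper below is a hypothetical alternative, not part of the original analysis, that formats the sign explicitly; `format_lm_eq` and the toy data are illustrative names.

```r
# Hypothetical helper: format a fitted simple regression as "y = ax + b"
# with an explicit sign, so a negative intercept does not print as "+ -3"
format_lm_eq <- function(model, digits = 2) {
  coeffs <- unname(coef(model))
  intercept <- coeffs[1]
  slope <- coeffs[2]
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  sign_chr <- if (intercept < 0) "-" else "+"
  paste0("y = ", fmt(slope), "x ", sign_chr, " ", fmt(abs(intercept)))
}

# Made-up data lying exactly on y = 2x - 3
toy <- data.frame(x = 1:5)
toy$y <- 2 * toy$x - 3
eq_label <- format_lm_eq(lm(y ~ x, data = toy))
eq_label
```

The resulting string can be passed to the label argument of annotate() in place of regression_eq.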
# Perform the regression analysis
model <- lm(pct_voluntary_protection ~ pct_govt_intervention, data = OOP_subset)
# Extract coefficients
coeffs <- coef(model)
intercept <- round(coeffs[1], 2)
slope <- round(coeffs[2], 2)
# Format the regression equation as a string
regression_eq <- paste0("y = ", slope, "x + ", intercept)
# Create the plot with annotation
ggplot(OOP_subset, aes(x = pct_govt_intervention, y = pct_voluntary_protection, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Voluntary Protection Contribution vs. Government Intervention Contribution',
x = 'Government Intervention (% of Total Health Expenditure)',
y = 'Voluntary Protection Contribution (% of Total Health Expenditure)'
) +
annotate("text", x = max(OOP_subset$pct_govt_intervention) - 10,
y = max(OOP_subset$pct_voluntary_protection) - 10,
label = regression_eq,
color = "black",
size = 5,
hjust = 1)
## `geom_smooth()` using formula = 'y ~ x'
foreign_sources <- GHED_data %>%
select(country, year, fs7)
OOP_subset <- OOP_subset %>%
left_join(foreign_sources, by = c("year", "country"))
OOP_subset$ratio_outsourcing <- (OOP_subset$fs7 / OOP_subset$total_expenditure) * 100
OOP_subset_lowinc <- OOP_subset %>%
filter(income %in% c("Low", "Lower-middle")) %>%
filter(!is.na(ratio_outsourcing))
# Perform the regression analysis
model <- lm(pct_govt_intervention ~ ratio_outsourcing, data = OOP_subset_lowinc)
# Extract coefficients
coeffs <- coef(model)
intercept <- round(coeffs[1], 2)
slope <- round(coeffs[2], 2)
# Format the regression equation as a string
regression_eq <- paste0("y = ", slope, "x + ", intercept)
# Create the plot with annotation
ggplot(OOP_subset_lowinc, aes(x = ratio_outsourcing, y = pct_govt_intervention, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Percent Government Intervention vs. Magnitude of Outsourcing',
x = 'Outsourcing Magnitude (% of Total Health Expenditure)',
y = 'Government Intervention (% of Total Health Expenditure)'
) +
annotate("text", x = max(OOP_subset_lowinc$ratio_outsourcing) - 10,
y = max(OOP_subset_lowinc$pct_govt_intervention) - 10,
label = regression_eq,
color = "black",
size = 5,
hjust = 1)
## `geom_smooth()` using formula = 'y ~ x'
OOP_subset_lowinc <- OOP_subset %>%
filter(income %in% c("Low", "Lower-middle")) %>%
filter(!is.na(ratio_outsourcing))
# Perform the regression analysis
model <- lm(pct_household_out_of_pocket ~ ratio_outsourcing, data = OOP_subset_lowinc)
# Extract coefficients
coeffs <- coef(model)
intercept <- round(coeffs[1], 2)
slope <- round(coeffs[2], 2)
# Format the regression equation as a string
regression_eq <- paste0("y = ", slope, "x + ", intercept)
# Create the plot with annotation
ggplot(OOP_subset_lowinc, aes(x = ratio_outsourcing, y = pct_household_out_of_pocket, color = income)) +
geom_point() +
geom_smooth(method = 'lm', color = 'blue') +
labs(
title = 'Percentage of Household OOP Costs vs. Magnitude of Outsourcing',
x = 'Magnitude of Outsourcing (% of Total Health Expenditure)',
y = 'OOP Costs (% of Total Health Expenditure)'
) +
annotate("text", x = max(OOP_subset_lowinc$ratio_outsourcing) - 10,
y = max(OOP_subset_lowinc$pct_household_out_of_pocket) - 10,
label = regression_eq,
color = "black",
size = 5,
hjust = 1)
## `geom_smooth()` using formula = 'y ~ x'
Vaccine and disease-specific healthcare spending data are hard to come by in this dataset. There are many missing values, which makes it difficult to make meaningful comparisons. However, it is interesting to note that we are able to observe how different regions have different predominant diseases.
We then looked into how different types of countries allocate funds towards healthcare based on region and income level. There were stark differences and an abundance of data. We were also able to observe outliers, which suggests that other factors likely influence how money is allocated towards healthcare.
We also looked specifically at how insurance is used to offset out-of-pocket costs in countries with low government healthcare spending. Some low-income countries are able to absorb healthcare costs using private insurance, while others are unable to do so. Exploring why this is the case may also prove to be a fruitful line of inquiry.
There are also a few different questions that may be worthwhile to explore. A key question is why some low-income countries exhibit lower out-of-pocket costs despite minimal government spending. Investigating the roles of external funding, voluntary contributions, and cultural or policy factors could help identify replicable strategies for reducing financial burdens in other low-resource settings. This would require merging additional data with our current data. For example, categorizing policy changes would be important in trying to identify effective operating procedures when providing aid to countries. This would be an excellent candidate for future work.
Citations:
World Health Organization (WHO). (2023). Global Health Expenditure Database. World Health Organization. Retrieved from https://apps.who.int/nha/database.
Dieleman, J. L., & Hanlon, M. (2013). Measuring the displacement and replacement of government health expenditure. Health Economics, 23(2), 129–140. https://doi.org/10.1002/hec.3016
Hartman, M., Martin, A. B., Washington, B., Catlin, A., & The National Health Expenditure Accounts Team. (2022). National health care spending in 2020: Growth driven by federal spending in response to the COVID-19 pandemic. Health Affairs, 41(1), 13–25. https://doi.org/10.1377/hlthaff.2021.01763
Morrissey, O. (2015). Aid and government fiscal behavior: Assessing recent evidence. World Development, 69, 98–105. https://doi.org/10.1016/j.worlddev.2013.12.008
Piatti-Fünfkirchen, M., Lindelow, M., & Yoo, K. (2018). What are governments spending on health in East and Southern Africa? Health Systems & Reform, 4(4), 284–299. https://doi.org/10.1080/23288604.2018.1510287
Wagstaff, A., Eozenou, P., & Smitz, M. (2020). Out-of-pocket expenditures on health: A global stocktake. The World Bank Research Observer, 35(2), 123–157. https://doi.org/10.1093/wbro/lkz009
Chang, A. Y., et al. (2019). Past, present, and future of global health financing: A review of development assistance, government, out-of-pocket, and other private spending on health for 195 countries, 1995–2050. The Lancet, 393(10187), 2233–2260. https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(19)30841-4/fulltext